Please draw your own subjective distributions for the following events.

  1. The probability that it will snow at Reed this winter.
  2. The probability that, on a given night, the sun has gone supernova.
  3. The total number of individual socks that you own.

Karl Broman's Socks

Classical hypothesis test

Assert a model

\(H_0\): I have \(N_{pairs}\) pairs of socks and \(N_{singles}\) singletons. The first 11 socks that I pull out of the machine are a random sample from this population.

Decide on a test statistic

The number of singletons in the sample. Observed value: 11.

Construct the sampling distribution

Probability theory or simulation.

See where your observed stat lies in that distribution

Find the p-value if you like.

\(H_0\)

\[N_{pairs} = 9; \quad N_{singles} = 5\]

Constructing the sampling dist.

We'll use simulation.

Create the population of socks:

sock_pairs <- c("A", "B", "C", "D", "E", "F", "G", "H", "I")
sock_singles <- c("l", "m", "n", "o", "p")
# Each pair contributes two socks; singletons contribute one
socks <- c(rep(sock_pairs, each = 2), sock_singles)
socks
##  [1] "A" "A" "B" "B" "C" "C" "D" "D" "E" "E" "F" "F" "G" "G" "H" "H" "I"
## [18] "I" "l" "m" "n" "o" "p"

One draw from the machine

picked_socks <- sample(socks, size = 11, replace = FALSE)
picked_socks
##  [1] "o" "D" "A" "F" "E" "F" "l" "I" "I" "m" "H"
sock_counts <- table(picked_socks)
sock_counts
## picked_socks
## A D E F H I l m o 
## 1 1 1 2 1 2 1 1 1
n_singles <- sum(sock_counts == 1)
n_singles
## [1] 7

Our simulator
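The body of pick_socks() did not survive in this text version of the slides, so here is a plausible reconstruction: the function name and arguments come from the calls on the next slide, while the body is an assumption based on the one-draw code above.

```r
# Plausible reconstruction of the simulator (the body is an assumption;
# the name and arguments match the calls on the next slide).
pick_socks <- function(N_pairs, N_singles, N_pick) {
  # Sock population: IDs 1..N_pairs appear twice, the remaining IDs once
  socks <- rep(seq_len(N_pairs + N_singles),
               times = c(rep(2, N_pairs), rep(1, N_singles)))
  # Draw N_pick socks without replacement and count the singletons
  picked <- sample(socks, size = N_pick, replace = FALSE)
  sum(table(picked) == 1)
}

# Repeating many times builds the sampling distribution used below
sim_singles <- replicate(1000, pick_socks(N_pairs = 9, N_singles = 5, N_pick = 11))
```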

Constructing the sampling dist.

pick_socks(N_pairs = 9, N_singles = 5, N_pick = 11)
## [1] 9
pick_socks(9, 5, 11)
## [1] 7
pick_socks(9, 5, 11)
## [1] 7

Repeat many, many times…

The sampling distribution

The p-value

Quantifying how far into the tails our observed count was.

table(sim_singles)
## sim_singles
##   1   3   5   7   9  11 
##   2  48 248 411 250  41
table(sim_singles)[6]/1000
##    11 
## 0.041

Doubling the tail probability, our two-tailed p-value is 0.082.

Question

What is the best definition for our p-value in probability notation?

  1. P(\(H_0\) is true | data) = 0.041
  2. P(\(H_0\) is false | data) = 0.041
  3. P(\(H_0\) is true) = 0.041
  4. P(data | \(H_0\) is true) = 0.041
  5. P(data) = 0.041


The challenges with the classical method

The result of a hypothesis test is a probability of the form:

\[ P(\textrm{ data or more extreme } | \ H_0 \textrm{ true }) \]

while most people think they're getting

\[ P(\ H_0 \textrm{ true } | \textrm{ data }) \]

How can we go from the former to the latter?

What we have

\[ P(\textrm{ data or more extreme } | \ H_0 \textrm{ true }) \]

What we want

\[ P(\ H_0 \textrm{ true } | \textrm{ data }) \]

Bayesian Modeling

Bayes Rule

\[P(A \ | \ B) = \frac{P(A \textrm{ and } B)}{P(B)} \]

\[P(A \ | \ B) = \frac{P(B \ | \ A) \ P(A)}{P(B)} \]

\[P(model \ | \ data) = \frac{P(data \ | \ model) \ P(model)}{P(data)} \]
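To make the rule concrete, here is a quick numeric check with made-up probabilities (every value below is purely illustrative):

```r
# Bayes' rule with made-up numbers (purely illustrative)
p_model <- 0.5                 # P(model), the prior
p_data_given_model <- 0.2      # P(data | model)
p_data_given_not <- 0.6        # P(data | not model)

# P(data), by the law of total probability
p_data <- p_data_given_model * p_model + p_data_given_not * (1 - p_model)

# P(model | data), the posterior
p_model_given_data <- p_data_given_model * p_model / p_data
p_model_given_data
## [1] 0.25
```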

What does it mean to think about \(P(model)\)?

Please draw your own subjective distributions for the following events.

  1. The probability that it will snow at Reed this winter.
  2. The probability that, on a given night, the sun has gone supernova.
  3. The total number of individual socks that you own.

Prior distribution

A prior distribution is a probability distribution for a parameter that summarizes the information that you have before seeing the data.

Prior on proportion pairs
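The plot on this slide is not reproduced here. A beta distribution is one natural way to encode a prior on a proportion; the shape parameters below are assumptions chosen for illustration, reflecting a belief that most socks come in pairs:

```r
# Draws from an assumed Beta(15, 2) prior on the proportion of socks
# that come in pairs (shape values are illustrative assumptions)
prior_prop_pairs <- rbeta(10000, shape1 = 15, shape2 = 2)
hist(prior_prop_pairs, main = "Prior on proportion of pairs",
     xlab = "prop_pairs")
```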

Full simulation
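The code that builds sock_sim is not shown in this text version. The sketch below is a hedged reconstruction of the idea: sample a sock population from the prior, pull 11 socks, and record what the sample looks like. The specific priors (a negative binomial on the total number of socks, a beta on the proportion in pairs) and their parameter values are assumptions.

```r
# One draw from the prior + data-generating model (priors are assumptions)
sim_one <- function() {
  n_socks <- rnbinom(1, mu = 30, size = 4)     # prior on total socks
  prop_pairs <- rbeta(1, 15, 2)                # prior on proportion paired
  n_pairs <- round(floor(n_socks / 2) * prop_pairs)
  n_singles <- n_socks - 2 * n_pairs
  # Build the population and pull 11 socks (or all of them, if fewer)
  socks <- rep(seq_len(n_pairs + n_singles),
               times = c(rep(2, n_pairs), rep(1, n_singles)))
  picked <- table(sample(socks, size = min(11, n_socks)))
  data.frame(unique = sum(picked == 1), pairs = sum(picked == 2),
             n_socks = n_socks, prop_pairs = prop_pairs)
}

sock_sim <- do.call(rbind, replicate(10000, sim_one(), simplify = FALSE))
```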

head(sock_sim)
##   unique pairs n_socks prop_pairs
## 1      3     4      16      0.970
## 2      7     2      33      0.914
## 3      9     1      51      0.929
## 4      1     4       9      0.955
## 5      9     1      45      0.851
## 6      9     1      21      0.726
sock_sim %>%
  filter(unique == 11, pairs == 0) %>%
  head()
##   unique pairs n_socks prop_pairs
## 1     11     0      49      0.692
## 2     11     0      37      0.873
## 3     11     0      49      0.815
## 4     11     0      62      0.961
## 5     11     0      53      0.974
## 6     11     0      59      0.847

Proportion of pairs

Number of socks

Karl Broman's Socks

The posterior distribution

  • Distribution of a parameter after conditioning on the data
  • Synthesis of prior knowledge and observations (data)

Question

What is your best guess for the number of socks that Karl has?

Our best guess

  • The posterior median is 44 socks.

Karl Broman's Socks

\[ 21 \times 2 + 3 = 45 \textrm{ socks} \]

Summary

Bayesian methods . . .

  • Require the subjective specification of your prior knowledge
  • Provide a posterior distribution on the parameters
  • Yield results with a direct, intuitive interpretation
  • Are computationally expensive